Multi-Modal 3D Object Detection in Autonomous Driving: a Survey
In the past few years, we have witnessed rapid development of autonomous
driving. However, achieving full autonomy remains a daunting task due to the
complex and dynamic driving environment. As a result, self-driving cars are
equipped with a suite of sensors to conduct robust and accurate environment
perception. As the number and type of sensors keep increasing, combining them
for better perception is becoming a natural trend. So far, there has been no
in-depth review that focuses on multi-sensor fusion-based perception. To bridge
this gap and motivate future research, this survey is devoted to reviewing recent
fusion-based deep learning models for 3D detection that leverage multiple sensor
data sources, especially cameras and LiDARs. In this survey, we first introduce
the background of popular sensors for autonomous cars, including their common
data representations as well as object detection networks developed for each
type of sensor data. Next, we discuss some popular datasets for multi-modal 3D
object detection, with a special focus on the sensor data included in each
dataset. Then we present in-depth reviews of recent multi-modal 3D detection
networks by considering the following three aspects of the fusion: fusion
location, fusion data representation, and fusion granularity. After a detailed
review, we discuss open challenges and point out possible solutions. We hope
that our detailed review can help researchers embark on investigations in the
area of multi-modal 3D object detection.
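For readers new to the taxonomy above, the sketch below illustrates one concrete meaning of "fusion location": feature-level (late) camera-LiDAR fusion on a shared bird's-eye-view grid, with point-level (early) fusion noted in a closing comment. This is a generic, minimal PyTorch sketch and is not taken from any surveyed model; all module names, channel sizes, and tensor shapes are assumptions made for illustration.

```python
# Minimal sketch (assumed, not from any surveyed model) contrasting two common
# "fusion locations" for camera-LiDAR 3D detection.
import torch
import torch.nn as nn

class LateBEVFusion(nn.Module):
    """Feature-level (late) fusion: each modality is first encoded separately
    into a bird's-eye-view (BEV) grid; the grids are then concatenated and
    reduced before a shared 3D detection head."""

    def __init__(self, lidar_c: int = 64, camera_c: int = 64, fused_c: int = 128):
        super().__init__()
        self.reduce = nn.Conv2d(lidar_c + camera_c, fused_c, kernel_size=3, padding=1)

    def forward(self, lidar_bev: torch.Tensor, camera_bev: torch.Tensor) -> torch.Tensor:
        # lidar_bev:  (B, C_lidar, H, W) BEV features from a point-cloud backbone
        # camera_bev: (B, C_cam, H, W)  image features lifted onto the same BEV grid
        return self.reduce(torch.cat([lidar_bev, camera_bev], dim=1))

# Point-level (early) fusion would instead decorate raw LiDAR points with image
# features *before* the point-cloud backbone runs, e.g.:
#   decorated = torch.cat([points_xyz, image_features_at_projected_pixels], dim=-1)
```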
OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving
Visual Odometry (VO) plays a pivotal role in autonomous systems, with a
principal challenge being the lack of depth information in camera images. This
paper introduces OCC-VO, a novel framework that capitalizes on recent advances
in deep learning to transform 2D camera images into 3D semantic occupancy,
thereby circumventing the traditional need for concurrent estimation of ego
poses and landmark locations. Within this framework, we utilize the TPV-Former
to convert surround-view camera images into 3D semantic occupancy. Addressing
the challenges presented by this transformation, we have specifically tailored
a pose estimation and mapping algorithm that incorporates a Semantic Label
Filter and a Dynamic Object Filter, and finally utilizes a Voxel PFilter to
maintain a consistent global semantic map. Evaluations on the Occ3D-nuScenes benchmark
not only showcase a 20.6% improvement in Success Ratio and a 29.6% enhancement
in trajectory accuracy against ORB-SLAM3, but also emphasize our ability to
construct a comprehensive map. Our implementation is open-sourced and available
at https://github.com/USTCLH/OCC-VO.
Comment: 7 pages, 3 figures.
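As a rough illustration of the mapping side of the pipeline described above (not the authors' implementation), the following sketch shows how per-frame semantic occupancy might be filtered and accumulated into a global voxel map. The occupancy prediction (TPV-Former), the pose estimate, and the exact behavior of the Semantic Label Filter, Dynamic Object Filter, and Voxel PFilter are all replaced with simplified stand-ins; the class IDs, voxel size, and hit-count threshold are invented for the example.

```python
# Hedged sketch of an OCC-VO-style mapping loop; all constants are assumptions.
import numpy as np

DYNAMIC_CLASSES = {1, 2}   # e.g. car, pedestrian label IDs (assumed)
STATIC_KEEP = {3, 4, 5}    # semantic labels trusted for mapping (assumed)

def frame_to_map(occupancy, labels, pose, global_map, min_hits=3):
    """occupancy: (N, 3) voxel centers in the ego frame
    labels:     (N,) semantic label per voxel
    pose:       4x4 ego-to-world transform estimated for this frame
    global_map: dict mapping a quantized world voxel -> [label, hit_count]"""
    # Semantic Label Filter / Dynamic Object Filter, both heavily simplified:
    keep = np.isin(labels, list(STATIC_KEEP)) & ~np.isin(labels, list(DYNAMIC_CLASSES))
    pts = occupancy[keep]
    # transform surviving voxels into the world frame
    pts_h = np.c_[pts, np.ones(len(pts))]
    world = (pose @ pts_h.T).T[:, :3]
    # Voxel "P-Filter" stand-in: only voxels observed repeatedly are trusted
    for p, lab in zip(np.round(world / 0.4).astype(int), labels[keep]):
        entry = global_map.setdefault(tuple(p), [lab, 0])
        entry[1] += 1
    return {k: v for k, v in global_map.items() if v[1] >= min_hits}
```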
Mga Modulates Bmpr1a Activity by Antagonizing Bs69 in Zebrafish
MAX giant associated protein (MGA) is a dual transcription factor containing both T-box and bHLHzip DNA-binding domains. In vitro studies have shown that MGA functions as a transcriptional repressor or activator to regulate transcription from promoters containing either E-box or T-box binding sites. BS69 (ZMYND11), a multidomain (i.e., PHD, BROMO, PWWP, and MYND) protein, has been shown to selectively recognize the histone variant H3.3 trimethylated at lysine 36 (H3.3K36me3), modulate RNA Polymerase II elongation, and function as an RNA splicing regulator. Mutations in MGA or BS69 have been linked to multiple cancers and neural developmental disorders. Here, by TALEN- and CRISPR/Cas9-mediated loss-of-gene-function assays, we show that zebrafish Mga and Bs69 are required to maintain proper Bmp signaling during early embryogenesis. We found that Mga protein localized in the cytoplasm modulates Bmpr1a activity through physical association with Zmynd11/Bs69. The MYND domain of Bs69 specifically binds the kinase domain of Bmpr1a and interferes with its phosphorylation and activation of Smad1/5/8. Mga acts to antagonize Bs69 and facilitate the Bmp signaling pathway by disrupting the Bs69-Bmpr1a association. Functionally, Bmp signaling under the control of Mga and Bs69 is required for properly specifying the ventral tailfin cell fate.
: Transferring Visual Representations for Reinforcement Learning via Prompting
It is important for deep reinforcement learning (DRL) algorithms to transfer
their learned policies to new environments that have different visual inputs.
In this paper, we introduce Prompt based Proximal Policy Optimization
(), a three-stage DRL algorithm that transfers visual representations
from a target to a source environment by applying prompting. The process
consists of three stages: pre-training, prompting, and predicting. In
particular, we specify a prompt-transformer for representation conversion and
propose a two-step training process to train the prompt-transformer for the
target environment, while the rest of the DRL pipeline remains unchanged. We
implement and evaluate it on the OpenAI CarRacing video game. The
experimental results show that it outperforms state-of-the-art visual
transfer schemes. In particular, it allows the learned policies to
perform well in environments with different visual inputs, which is much more
effective than retraining the policies in these environments.
Comment: This paper has been accepted for presentation at the upcoming IEEE
International Conference on Multimedia & Expo (ICME) in 2023.
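To make the "prompting" stage concrete, here is a minimal sketch of a prompt-transformer that re-maps target-environment observations into the representation space a frozen pre-trained policy expects. It is an illustrative PyTorch sketch under stated assumptions, not the paper's architecture; the observation shape, prompt length, and layer sizes are invented for the example.

```python
# Hedged sketch of a prompt-transformer for observation-space transfer.
# Shapes and hyperparameters are assumptions, not the paper's values.
import torch
import torch.nn as nn

class PromptTransformer(nn.Module):
    def __init__(self, obs_dim: int, prompt_len: int = 8, d_model: int = 128):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model))  # learned prompt tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, obs_dim)

    def forward(self, target_obs: torch.Tensor) -> torch.Tensor:  # (B, obs_dim)
        tokens = self.embed(target_obs).unsqueeze(1)               # (B, 1, d)
        prompt = self.prompt.unsqueeze(0).expand(tokens.size(0), -1, -1)
        x = self.encoder(torch.cat([prompt, tokens], dim=1))       # (B, P+1, d)
        return self.out(x[:, -1])                                  # source-like observation

# Training, conceptually: the pre-trained policy stays frozen and only the
# prompt-transformer's parameters receive gradients on target-environment data.
```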
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
LiDAR and Radar are two complementary sensing approaches in that LiDAR
specializes in capturing an object's 3D shape while Radar provides longer
detection ranges as well as velocity hints. Though seemingly natural, how to
efficiently combine them for improved feature representation is still unclear.
The main challenge arises from the fact that Radar data are extremely sparse and lack
height information. Therefore, directly integrating Radar features into
LiDAR-centric detection networks is not optimal. In this work, we introduce a
bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the
challenges and improve 3D detection for dynamic objects. Technically,
Bi-LRFusion involves two steps: first, it enriches Radar's local features by
learning important details from the LiDAR branch to alleviate the problems
caused by the absence of height information and extreme sparsity; second, it
combines LiDAR features with the enhanced Radar features in a unified
bird's-eye-view representation. We conduct extensive experiments on nuScenes
and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art
performance for detecting dynamic objects. Notably, Radar data in these two
datasets have different formats, which demonstrates the generalizability of our
method. Code is available at https://github.com/JessieW0806/BiLRFusion.
Comment: Accepted by CVPR 2023.
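Below is a minimal sketch of the two-step, bi-directional idea described above, assuming both modalities have already been encoded onto the same BEV grid. The module names, channel sizes, and the use of plain convolutions are illustrative assumptions and do not reflect the released code.

```python
# Hedged sketch of bi-directional LiDAR-Radar fusion in BEV (assumed structure).
import torch
import torch.nn as nn

class BiDirectionalLRFusion(nn.Module):
    def __init__(self, lidar_c: int = 64, radar_c: int = 32, fused_c: int = 128):
        super().__init__()
        # Step 1: LiDAR -> Radar enrichment, compensating Radar's sparsity and
        # missing height information with details learned from the LiDAR branch.
        self.enrich = nn.Conv2d(lidar_c + radar_c, radar_c, kernel_size=3, padding=1)
        # Step 2: Radar -> LiDAR fusion of the enriched features in the shared BEV grid.
        self.fuse = nn.Conv2d(lidar_c + radar_c, fused_c, kernel_size=3, padding=1)

    def forward(self, lidar_bev: torch.Tensor, radar_bev: torch.Tensor) -> torch.Tensor:
        # Both inputs are (B, C, H, W) feature maps on the same BEV grid.
        enriched_radar = self.enrich(torch.cat([lidar_bev, radar_bev], dim=1))
        return self.fuse(torch.cat([lidar_bev, enriched_radar], dim=1))
```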